Method and system for performing classified document research

ABSTRACT

A system and method for efficiently and accurately identifying relevant document classifications is contemplated. The document analysis system receives classified reference documents along with a relevancy indicator for each document and generates sensory indicators that assist a researcher in identifying relevant classifications that have not been previously researched. In one aspect, the document analysis system generates a table of classifications, the classifications being determined by scoring of each classification cited within each relevant document. The system then determines a sensory indicator (e.g. a color) for each classification that indicates the extent to which the classification has been previously searched. The classification analysis window thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes require further search. In this manner the researcher may quickly determine where to direct a next iteration of a search.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/250,557, filed on Oct. 11, 2009 by the inventor of the present invention, the entire contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to the field of document research, and more particularly to methods and systems for locating relevant classifications.

BACKGROUND

Document research involves finding relevant subject matter within a set of documents, such as those held in a document repository. Search engines, for example, use “key” words or phrases as search arguments to locate text passages containing those words or phrases. Classification systems provide another means for assessing context. In a classification system, documents with common threads are grouped together in classes, so a field of context can be narrowed by selecting relevant classes. Patents and patent-related documentation databases are examples of document repositories that implement classification systems. The most commonly used classification system for patents and published patent applications, at least in the U.S., is the USPTO (United States Patent and Trademark Office) Patent Classification System. Two other classification systems in common international use are the “IPC” (International Patent Classification) and the “ECLA” (European Classification).

Document classification systems provide a means for improving the productivity of a document researcher. However, in many large-scale databases the classification system itself may be complex. Patents and patent-related documentation databases provide examples of such large-scale classified database systems with correspondingly complex classification systems. The USPTO classification system currently comprises at least 984 classes and numerous digests (collections of certain subjects) within each class. Each class is broken into subclasses, each subclass may be further broken into subclasses, and so on. Patents are thus grouped into categories, which are broken down into sub-categories, which are in turn broken down further as required. USPTO examiners decide the class/subclass in which to file a particular invention. To add further complexity, any one invention can be filed in more than one class/subclass, and most are filed in several classes/subclasses.

The challenge of performing document research in such a large-scale document repository, therefore, is to develop an experienced understanding of the classification system. Existing classification analysis tools provide some assistance in navigating classifications. See, for example, U.S. Pat. No. 7,333,984 to Oosta, which shows a counting and sorting technique in FIG. 8. However, that analysis is broad and does not show a researcher where he or she needs to search. This matters because a patent search involves many iterations of polling a database, and with each iteration the researcher should progressively narrow the size of the field of interest.

U.S. Patent Application 20020022974 to Lindh shows a method for display of patent information that involves applying statistical analysis to groups of references containing classifications. Lindh does not show additional cross-referencing against a search history in order to locate unsearched classifications, which again is important in progressively narrowing and focusing a search.

U.S. Patent Application 20090313221 to Chen shows a patent technology association classification method. While Chen shows a method of removing classifications and counting frequency, Chen fails to show the additional function of comparing classification frequencies to search histories, nor does Chen show broad and narrow reporting schemes for use at different stages of a patent search.

U.S. Patent Application 20080228724 to Huang et al. seeks to assist a researcher in performing classification-based research. Huang shows a technical classification method for searching patents, which includes generating counts from a group of references. The method shows the researcher the quality of a search, but falls short in that Huang does not assist the researcher in locating additional classification areas to search in a next iteration.

U.S. Patent Application 20020073095 to Ohga shows a patent classification displaying method and apparatus having some similarities to the present invention. As seen in FIG. 4 of Ohga, the apparatus provides a classification counting system in which the most frequently occurring codes are sorted to the top of the list. Other systems, such as Thomson Delphion, have similar reporting features. Several critical components are, however, missing when these are viewed next to the present invention. First, the classification codes on the report should be cross-referenced against a running tally of codes kept by the researcher in a given search project. With this additional function, the researcher sees not only relevant classifications but also classifications that have not yet been searched. In addition, Ohga fails to show additional modes of class counting and weighting for use at different stages of a patent research project, such that the researcher can use broad analysis at the beginning and narrow analysis during the iterative part of the patent search.

U.S. Patent Application 20010027452 to Tropper shows a system and method to identify documents in a database that relate to a given document by using recursive searching and no keywords. While Tropper realizes the benefits of using the latest search results to form new searches, he fails to teach accumulating classification codes, weighting the codes, ranking the codes, and then comparing the rankings to the researcher's search history.

A need thus exists for an improved classification analysis system, not only for the less-experienced document researcher but also for improving the efficiency of those with established skill and experience with a particular classification system. Embodiments of the present invention address many of the shortfalls in the prior art while presenting what will hereinafter become apparent as a pioneering document analysis technology.

BRIEF SUMMARY OF THE PRESENT INVENTION

It is a first object of the present invention to provide a classification analysis system that equips a researcher with broad-scope reporting for the initial phase of a search project. It is a second object to enable the researcher to progressively narrow the scope of the search project. Yet another object of the present invention is to enable the researcher to track a classification search history such that duplication is avoided. Still another object of the present invention is to provide a system of narrow classification analysis cross-referenced against the classification search history. Yet another object of the present invention is to enable the researcher to cycle effectively through the narrow phase of a search project. Still another object of the present invention is to provide a system that permits the researcher to confidently end a classification-based search project.

The present invention provides a system and method for efficiently and accurately identifying relevant document classifications. The system receives one or more classified reference documents in a document set, along with a relevancy indicator for each document. The system retrieves all document classifications from the document set and arranges them in a classification analysis interface. The interface has four modes, called Main, Parents, Subclass, and Primary, of which Main is the broadest and Primary is the narrowest. The researcher is provided with GUI tools to select classification codes from the classification analysis interface and add them to a classification search history, which is stored along with the document set in a project file.

In use, the researcher uses the Main and Parents modes during the first hour of the search project, and the Subclass mode for the remaining 3-4 hours. In Main mode, the researcher is shown the occurrence of main classes in the document set, which provides a broad base for class/text searching. In Parents mode, the researcher is shown the common occurrence of parent subclassifications of the document classifications, while the document classifications themselves are not shown. With this information, the researcher can inspect the child classifications of the parents in a classification schedule. For the bulk of the search project, the researcher uses Subclass mode. In Subclass mode, the document classifications are collected, counted, scored, and sorted, giving the researcher a quick view of potentially relevant classifications. Once the researcher locates potentially relevant classifications, he or she executes searches in the newly located classifications and then adds documents, along with relevancy indicators, to the expanding document set. The researcher then re-executes Subclass mode classification analysis on the document set. The classification analysis module scores the classification codes and then cross-references them against the classification search history. The resulting classification analysis interface is displayed along with sensory indicators (e.g., a color) that show the researcher which relevant classifications are 1) unsearched, 2) partially searched, or 3) fully searched. In this manner the researcher may quickly determine where the next iteration of the search project should be directed. The researcher may continuously iterate through the process of locating new classification areas, searching them, augmenting the document set with new documents, and then using the classification analysis tool to locate additional unsearched classification areas. The researcher is encouraged to use a document management interface to add many (i.e., 50-100) documents to the project file, tagging even moderately relevant documents so that many hundreds of classification codes contribute to the scoring. The process continues until the top 5-10 classifications presented by the classification analysis interface are indicated as fully searched, at which point the search project can be brought to a close. With the present invention, important classification areas are very difficult to overlook, regardless of the experience level of the researcher.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a block diagram illustrating a document research system in accordance with an exemplary embodiment of the invention.

FIG. 1B is a sample of a document.

FIG. 2A is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2B is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2C is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2D is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2E is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2F is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2G is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2H is a diagram of a project file created and used by the present invention.

FIG. 2I is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2J is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 3A is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

FIG. 3B is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

FIG. 3C is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

FIG. 3D is a search indicator to sensory indicator color scheme table.

FIG. 4 is a block diagram illustrating a document analysis system in accordance with another exemplary embodiment of the invention.

FIG. 5 is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

DETAILED DESCRIPTION

Reference will now be made in detail to the present exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings.

Referring to FIG. 1A, a block diagram is shown illustrating a document analysis system 1000, or search system, in accordance with an exemplary embodiment of the invention. The document analysis system 1000 comprises a client device 1010, which may be a computer. The client device 1010 includes a classification analysis module 1012, an interface module 1014 and a user input/output (I/O) interface 1018. By way of example, the client device 1010 may be a computing device having a processor, such as a personal computer, a phone, a mobile phone, or a personal digital assistant. The document analysis system 1000 may also comprise a document provider 1030, a classification data provider 1040 and a network 1020. The document provider 1030 is configured to deliver one or more documents, labeled generally as 1032. By way of example, the documents 1032 may be electronic files containing patent data or any type of electronic file that contains textual data. See FIG. 1B for an example of a document 1032. As seen, the document 1032 has multiple document classifications 135, each of which is further divided into a class 136 and a subclass 137. In addition, the body of the document is composed of multiple sections (e.g., abstract, description, claims), and each section is further divided into paragraphs 138. The textual data of each document 1032 includes content data and one or more classification codes 135. The document provider 1030 may be a remote server and may also include a search engine 1034 for retrieving the one or more documents 1032 from a document data repository (not shown) based on a search query. By way of example, the search engine may be that provided by the United States Patent and Trademark Office (USPTO), FreePatentsOnline, Micropatent®, Delphion®, PatentCafe®, Thomson Innovation or Google®. The document provider 1030 may retrieve the document data from a local repository or from one or more remote document repositories. Examples of such document repositories include patent databases such as those provided by EP (European patents), WO (PCT publications), JP (Japan abstracts) and DWPI (Derwent World Patent Index, for patent families). Moreover, the document provider 1030 may be a cloud-based bulk storage system, such as Amazon Simple Storage Service.
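The document structure of FIG. 1B can be modeled with a small data structure. The following is a minimal Python sketch, not the system's actual implementation: it assumes USPC-style codes written as "class/subclass" (for example "705/7.13") and treats the first listed classification as the primary one, consistent with the primary flag described at step 702. The `Classification` and `Document` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Classification:
    """A document classification 135 split into class 136 and subclass 137."""

    main_class: str
    subclass: str

    @classmethod
    def parse(cls, code: str) -> Classification:
        # Assumes USPC-style "class/subclass" notation such as "705/7.13".
        main_class, sep, subclass = code.strip().partition("/")
        if not sep or not main_class.strip() or not subclass.strip():
            raise ValueError(f"not a class/subclass code: {code!r}")
        return cls(main_class.strip(), subclass.strip())

    def __str__(self) -> str:
        return f"{self.main_class}/{self.subclass}"


@dataclass
class Document:
    """Hypothetical in-memory stand-in for a document 1032."""

    number: str
    title: str
    classifications: list[Classification] = field(default_factory=list)

    @property
    def primary(self) -> Classification | None:
        # The first listed classification is treated as the primary one.
        return self.classifications[0] if self.classifications else None
```

For example, a `Document` built from the codes "705/7.13" and "707/3" reports 705/7.13 as its primary classification.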

The classification data provider 1040 is configured to provide access to a classification data repository 1042. The classification data repository 1042 may be a database or file storage element that stores hierarchical classification data entries 1044. Each classification data entry 1044 includes a classification code and may also include a classification code description field. The classification data provider 1040 may be a remote server provided by the United States Patent and Trademark Office (USPTO). The classification data may be representative of a document classification system such as the Manual of Classification issued by the USPTO. The classification data provider may retrieve the classification data from a local repository or from one or more remote repositories. It is noted that, while shown as separate components, the document provider and classification data provider may be co-located on a single remote server.

The interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030, and to retrieve classification data 1044 from the classification data provider 1040, by way of the network 1020. By way of example, the network may be the Internet. The interface module 1014 may alternatively be configured to receive the documents 1032 or classification data 1044 through the user I/O interface 1018. In such an embodiment, the documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device, and the user I/O interface 1018 may include a communications interface, such as a wireless interface, a CD/DVD drive or a USB drive, for retrieving data from the portable storage device. The documents 1032 may alternately be paper-based documents and may be provided to the interface module 1014 by use of a scanner (not shown) that is configured with the I/O interface 1018. The client device 1010 may also include a data storage element 1016, which may be at least one of a computer readable medium and a memory. The interface module 1014 may also be configured to receive a set of one or more concepts from a researcher by way of the I/O interface 1018. The I/O interface 1018 may also include at least one input device, such as a keyboard, mouse, microphone or touch screen, for receiving the concepts from the researcher. Each concept comprises one or more text-based keywords or sets of text-based keywords, which are used to determine the relevancy of each of the documents 1032. The client device 1010 may alternatively include a document analysis module that generates statistical data based on the user-defined concepts and the documents 1032. The researcher may use the statistical data to quickly assess the relevancy of each document 1032 to each of the user-defined concepts. The document analysis module may transmit the statistical data to the interface module 1014, which presents the data to the researcher by way of the I/O interface 1018. The I/O interface 1018 may also include a display, such as an LCD or CRT monitor, configured to display a graphical user interface (GUI) for presenting information such as the statistical data to the researcher. The GUI will now be discussed in greater detail.

Referring now to FIGS. 2A through 2G, diagrams are shown illustrating a document analysis GUI 200 in accordance with an exemplary embodiment of the invention. FIG. 3A, which illustrates an exemplary method 1300 for performing classification-based document analysis, will also be discussed. At a first step, labeled 1310, the interface module 1014 may receive concept data from the researcher. The interface module 1014 first generates the document analysis GUI 200 and displays it to the researcher by way of the display device included with the user I/O interface 1018. As shown in FIG. 2A, the document analysis GUI 200 includes a document relevance interface 220, a document management interface 250, and a document image window 254. As seen in FIG. 2F, the researcher may start a research project by entering one or more concepts 272. Each concept 272 may have one or more words or word groups associated therewith. As shown in FIG. 2B, the document analysis GUI 200 includes a keyword entry interface 210, which comprises multiple rows of alphanumeric entry fields 212. One or more keywords 213 may be entered by a researcher into each entry field 212, where the keywords 213 on a line are conceptually related such that each line represents a keyword group 214. The researcher is also provided with a user thesaurus 211 and a web thesaurus 219. The user thesaurus 211 can be edited and stored in the data storage element 1016, and the web thesaurus 219 may be accessed through the network 1020 by the interface module 1014. FIG. 2B shows five of the alphanumeric entry fields 212 filled in. Each concept 272 and its corresponding keyword group 214 may be determined manually by the researcher or may be received from an external source. By way of example, the concepts may be reduced to a manageable number (e.g., 4-5 concepts). Keywords 213 may then be chosen for each of the concepts and entered into one of the alphanumeric fields 212 to form the keyword group 214. After entering each of the desired concepts, the researcher may exit the keyword entry interface 210 and proceed to analysis of a set of documents based on the user-defined concepts.
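As a hedged illustration of how keyword groups 214 might feed the statistical data mentioned earlier, the sketch below counts, for each concept, how many paragraphs 138 of a document mention any keyword in that concept's group. The statistic itself, the whole-word case-insensitive matching and the `concept_hits` name are all assumptions; the description does not specify how the statistics are computed.

```python
from __future__ import annotations

import re


def concept_hits(paragraphs: list[str], concepts: dict[str, list[str]]) -> dict[str, int]:
    """For each concept, count the paragraphs that mention any keyword in its group."""
    hits: dict[str, int] = {}
    for name, keywords in concepts.items():
        if not keywords:
            hits[name] = 0  # an empty keyword group matches nothing
            continue
        # Whole-word, case-insensitive match on any keyword in the group.
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
            re.IGNORECASE,
        )
        hits[name] = sum(1 for p in paragraphs if pattern.search(p))
    return hits
```

With the paragraphs "A vehicle with a brake.", "The engine drives the wheel." and "Braking force is applied.", and the concepts {"braking": ["brake", "braking"], "engine": ["engine", "motor"]}, the function returns {"braking": 2, "engine": 1}.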

At a next step, labeled 1320, the interface module 1014 receives one or more reference documents 1032. As discussed, the interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030 by way of the network 1020. The interface module 1014 may be configured to allow the researcher to request a predetermined set of documents 1032. By way of example, the researcher may initiate a request for a specific set of patent documents or for a set of patent documents that fall within a specific category or classification. The researcher may also initiate a search of a remote document repository through a search interface window 230 (shown in FIG. 2D) provided by the document analysis GUI 200. The search may be initiated by entering a set of search parameters, such as keywords, into one or more search fields 232 located on the search interface window 230. Boolean operators, wildcards and proximity indicators may be used to link the keywords together in logic sets. The search interface window 230 may also provide a search assistance window 234 that allows the previously defined keywords 213 to be added to the set of search parameters in response to a single user action (e.g., a mouse click). The search assistance window 234 thereby facilitates the loading of search parameters into the one or more search fields 232. In addition, the researcher is provided with a classification search history 290, which contains a table for documenting the search project strategy (discussed in detail later). The researcher may pick classification codes from the classification search history 290. As discussed, the interface module 1014 may alternatively be configured to receive one or more documents 1032 through the user I/O interface 1018, for example from a portable storage device (not shown) such as a CD, DVD or solid state device, read through a wireless interface, a CD/DVD drive or a USB drive. Upon receiving the one or more reference documents, the interface module 1014 populates a document management table 252, located on a document management interface 250 (shown in FIG. 2E), with selectable rows 253, each having information descriptive of one of the received documents 1032. By way of example, each row may include a reference document number 255 and a document title 256.

At a next step, labeled 1330, the interface module 1014 receives and stores data from the researcher that indicates the relevancy of a currently selected document 1032 to the one or more user-defined concepts. As discussed, the interface module 1014 populates the document management table 252 (shown in FIG. 2E) with selectable rows 253, each having information descriptive of one of the received reference documents. In the exemplary embodiment, the document management table 252 also includes one or more additional columns that allow the researcher to indicate (by way of a mouse click or similar navigation event) the relevance of the currently selected document. Each row of the document management table 252 may have a relevancy column 257 that contains an input field for indicating the overall relevance of the associated reference document. By way of example, the interface module 1014 may provide the researcher with the ability to select an indicium (e.g., using a drop-down menu list) such as “A” for highest relevance, “B” for suspected relevance, and “C” for uncertain relevance. Irrelevant documents may be marked with an “I”, which places a marker in a project file 205 (FIG. 2H) indicating that the reference document was reviewed. Each row of the document management table 252 may also have one or more additional columns, labeled generally as 258, that contain an input field for indicating whether a specific concept has been verified to appear in the currently selected reference document. The interface module 1014 may provide the researcher with the ability to toggle a field corresponding to a specific concept (one such field is labeled 259) “on” or “off” (e.g., by a mouse click) to indicate whether a particular concept 272 does or does not appear in the selected document. A column may be provided for each of the previously discussed concepts 272. As discussed, the interface module 1014 may provide the researcher with a concept management window 270 (see FIG. 2F) that allows the researcher to define the different concepts 272 from which the additional columns 258 may be derived. In this manner, the researcher is able to track higher-level or more abstract concepts than were initially defined, and may also give the concepts 272 more user-friendly names (useful, for example, for report generation). The interface module 1014 may also store the previously discussed relevancy indicators in the project file 205, which is located in the data storage element 1016 of FIG. 1A. By storing each of the indicators, the interface module 1014 is able to provide information to the classification analysis module 1012. The classification analysis module 1012 will now be discussed in greater detail.
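The description does not specify the format of project file 205. As a hedged illustration only, the sketch below assumes a JSON file holding one record per reviewed document (relevancy indicium, concept flags from columns 258, and classification codes) together with the classification search history 290. All field names are hypothetical, and the A-E relevancy range is taken from step 702, with "I" added for reviewed-but-irrelevant documents.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

# A-E as used in step 702, plus "I" for reviewed-but-irrelevant documents.
RELEVANCY_CODES = {"A", "B", "C", "D", "E", "I"}


@dataclass
class DocumentRow:
    number: str
    relevancy: str
    concepts: dict[str, bool] = field(default_factory=dict)  # columns 258
    classifications: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.relevancy not in RELEVANCY_CODES:
            raise ValueError(f"unknown relevancy {self.relevancy!r}")


@dataclass
class HistoryEntry:
    code: str     # previously identified classification code 291
    title: str    # classification title 292
    extent: bool  # search extent indicator 294 (True for "Yes")
    status: bool  # search status indicator 293 (True for "Yes")


@dataclass
class ProjectFile:
    documents: list[DocumentRow] = field(default_factory=list)
    history: list[HistoryEntry] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recursively converts the nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> ProjectFile:
        data = json.loads(text)
        return cls(
            documents=[DocumentRow(**d) for d in data["documents"]],
            history=[HistoryEntry(**h) for h in data["history"]],
        )
```

Saving with `to_json` and reloading with `from_json` reproduces an equal `ProjectFile`, so the relevancy indicators and search history survive between sessions.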

At a next step, labeled 1340, classification analysis begins with the interface module 1014 displaying a classification analysis interface 280, shown in FIG. 2G. The classification analysis interface 280 can include a classification search history 290, which the interface module 1014 retrieves from the project file 205. The classification search history 290 shows each previously identified classification code 291 and its corresponding previously identified classification title 292. Each previously identified classification code 291 also has a search extent indicator 294 and a search status indicator 293, both of which the researcher can set to various states. By way of example, if the researcher has already searched, or plans to search, a previously identified classification code 291 in its entirety, he or she may indicate this with the word “Yes” in the search extent indicator 294. In addition, the researcher may keep a record of which previously identified classification codes 291 have been properly addressed, by either text-limited searching or full searching, by making a similar indication in the search status indicator 293. The classification analysis interface 280 may include a document selection field 281 and a classification analysis mode selection field 282. The document selection field 281 provides one or more options for selecting the set of documents on which the classification analysis will be performed. By way of example, the researcher may select all documents in the project file 205 that have previously been indicated to be relevant to any of the concepts 272 (i.e., all documents selected in any of columns 258), all documents relevant to a specific concept (i.e., all documents selected in one of columns 258), or documents that have been indicated to have a specific overall relevance (e.g., all documents having a relevancy of “A” in relevancy column 257). The classification analysis interface 280 also has a class weighting 286 option and a relevancy weighting 287 option. The class weighting 286 instructs the classification analysis module 1012 to account for the total size of a classification, which counteracts the tendency of large classifications to overshadow smaller classifications in unweighted frequency counts. The relevancy weighting 287 allows the researcher to give greater weight in the scoring to documents 1032 with a higher relevance recorded in the relevancy column 257. The classification analysis mode selection field 282 provides one or more options for selecting the mode of classification analysis to be performed. The most common mode is the Subclass mode, which is discussed in the next step. (All four modes are discussed in detail below.)
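The three document selection options can be expressed as a simple filter. This is a minimal sketch assuming each document row carries its overall relevancy (column 257) and a dictionary of concept flags (columns 258); the `Row` type and `select_documents` function are hypothetical names, and allowing several options to be combined in one call is an added assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical document management row: relevancy (257) and concept flags (258)."""

    number: str
    relevancy: str
    concepts: dict[str, bool] = field(default_factory=dict)


def select_documents(
    rows: list[Row],
    any_concept: bool = False,
    concept: str | None = None,
    relevancy: str | None = None,
) -> list[Row]:
    """Return the rows that satisfy every requested selection option."""
    selected = []
    for row in rows:
        if any_concept and not any(row.concepts.values()):
            continue  # must be tagged with at least one concept
        if concept is not None and not row.concepts.get(concept, False):
            continue  # must be tagged with this specific concept
        if relevancy is not None and row.relevancy != relevancy:
            continue  # must carry this overall relevancy indicium
        selected.append(row)
    return selected
```

Calling `select_documents(rows, relevancy="A")` corresponds to the "all documents having a relevancy of A" option; calling it with no options returns every row.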

Step 1340 may proceed after the researcher confirms the previously described classification analysis options. The interface module 1014 then instructs the classification analysis module 1012 to perform classification analysis on the selected set of documents. Referring back to FIG. 1B, documents 1032 have one or more document classifications 135 associated therewith, each of which can be further divided into a class 136 and a subclass 137. The classification analysis module 1012 retrieves the document classifications 135 from each document and then generates a count of instances of each document classification 135 over the entire selected set. The classification analysis module 1012 then sends each document classification 135 and its corresponding count, or score, to the interface module 1014 to be displayed (step 1350) via the classification analysis interface 280, where each unique code is displayed in a separate row. The unique codes may be displayed in a classification code column 284 and the corresponding scores in a classification score column 283. The rows may be sorted based on the score of each unique code. In an alternative embodiment discussed later, the score for each code may be multiplied by a weighting factor that accounts for the size of each subclass (i.e., the number of documents in the subclass) or by a weighting factor that accounts for document relevance. The interface module 1014 may also retrieve a classification description for each unique code from the classification data provider 1040, using each unique classification code to look up the corresponding classification data entry 1044. The classification description may be displayed in a classification title column 285 of the classification analysis interface 280. The classification analysis module 1012 uses a search indicator to sensory indicator table 241, as seen in FIG. 3D, to determine a sensory indicator (e.g., a color) for each unique classification code that appears in the classification analysis interface. The classification analysis module 1012 determines the sensory indicator by first determining whether, and to what extent, the corresponding classification code has been previously searched. If a code appears in the classification code column 284 but does not appear as a previously identified classification code 291 in the classification search history 290, the code is assumed to be unsearched. If a code appears in the classification code column 284 and also appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “No”, the code is assumed to be partially searched. If a code appears in the classification code column 284 and also appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “Yes”, the code is assumed to be fully searched. The sensory indicator may be green highlighting if the code is unsearched, yellow highlighting if it has been partially searched, or red highlighting if it has been fully searched. The classification analysis interface 280 thus allows the researcher to quickly determine (e.g., by visual inspection) which classification codes have been cited most frequently, as well as which classification codes have not yet been searched. In this manner the researcher may very quickly determine where the next iteration of a search project should be directed.
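The counting, sorting and color-assignment logic of this step can be sketched in a few lines of Python. The sketch assumes the classification search history is available as a mapping from each previously identified code 291 to its search status indicator 293 (True for "Yes"); breaking score ties alphabetically is an added assumption, and all names are hypothetical. Unweighted counts are used here; the weighted variant is described with steps 703-706.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class Indicator(Enum):
    GREEN = "unsearched"
    YELLOW = "partially searched"
    RED = "fully searched"


def sensory_indicator(code: str, history: dict[str, bool]) -> Indicator:
    """Map a code to its indicator using the classification search history.

    `history` maps each previously identified code 291 to its search status
    indicator 293 (True for "Yes", False for "No").
    """
    if code not in history:
        return Indicator.GREEN
    return Indicator.RED if history[code] else Indicator.YELLOW


def analyze(
    documents: list[list[str]], history: dict[str, bool]
) -> list[tuple[str, int, Indicator]]:
    """Count each classification over the selected set, sort, and attach indicators."""
    counts = Counter(code for codes in documents for code in codes)
    # Highest count first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(code, n, sensory_indicator(code, history)) for code, n in ranked]
```

Each returned tuple corresponds to one row of the classification analysis interface 280: the code (column 284), its score (column 283) and the highlight color.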

At step 1350 the researcher will determine whether to add a new classification code to the search project. The researcher is provided the ability to quickly add entries to the classification search history 290 directly from the classification code column 284 using a mouse click. In doing so, the process will return to step 1320, as indicated by dashed arrow 1360, at which point the interface module 1014 provides a new search inquiry to the document provider 1030 and a new set of reference documents 1032 will be received. Each of steps 1330 through 1350 is repeated to determine the relevancy of the new set of reference documents to the user-defined concepts and whether the search should be expanded to a new classification. Steps 1320 through 1350 may be repeated until the researcher is satisfied that the most relevant classes have been searched. By way of example, the researcher may make this determination when a threshold number of the most frequently occurring classifications are highlighted in red, which indicates that all are present on the classification search history 290 and all are indicated as complete by the search status indicator 293. For instance, the threshold may be at least ten red highlighted classifications in the classification analysis interface 280.
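The example stopping rule can be expressed as a small check. This sketch makes two assumptions. First, the displayed rows are available as hypothetical (code, score, color) tuples that are already sorted by score. Second, "a threshold number of the most frequently occurring classifications" means the top N rows. The function name and the data shape are illustrative only.

```python
from typing import List, Tuple

# One displayed row: (classification code, score, highlight color),
# assumed to be already sorted by score from high to low.
Row = Tuple[str, float, str]


def search_complete(rows: List[Row], threshold: int = 10) -> bool:
    """True when the `threshold` highest-scoring codes are all highlighted red.

    Red means the code is on the search history and marked complete, so this
    mirrors the example stopping rule; the default of ten follows the example.
    """
    top = rows[:threshold]
    return len(top) == threshold and all(color == "red" for _, _, color in top)
```

Lower-ranked codes do not affect the result. A green code outside the top ten therefore does not block completion under this reading.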

Modes of Operation: As discussed, the classification analysis performed by the classification analysis module 1012 may be performed by first specifying a mode using the classification analysis mode selection field 282. By way of example, the classification analysis modes may include a Main Classes mode, a Subclass Parents mode, a Subclass mode, and a Primary Subclass mode. Referring to FIG. 3B, all four modes are shown and will now be discussed in detail. In addition, FIG. 3C shows the process of FIG. 3B with actual example values. Steps 701-706 are run in all modes and will be discussed first.

As seen at step 701, the classification analysis module 1012 retrieves the documents 1032 from the project file 205. The documents are then filtered according to the preference of the researcher using the document selection field 281. As an example, the researcher may run just “B” tagged documents or just documents having a specific element tagged in the document management table 252. Next, at step 702, the classification analysis module 1012 compiles all document classifications 135 into a 2D-Array 750 containing the document classification 135, relevancy, score, and primary fields (see, for example, array 750 in FIG. 3C). The relevancy is originally set by the researcher in relevancy column 257 as A, B, C, D, or E. The score is initially set to 1. Primary indicates whether the document classification 135 is the first one listed for its document. Next, at step 703, if the class weighting 286 is turned on, the process moves to step 704. At step 704, the interface module 1014 requests the classification size (i.e., the total number of documents currently classified therein) for each classification in the 2D-Array 750 from the classification data provider 1040. The classification analysis module 1012 then divides each score in the 2D-Array 750 by the corresponding classification size, which effectively weights each classification inversely according to its size. Next, at step 705, if the relevancy weighting 287 is turned on, the process moves to step 706. At step 706, the classification analysis module 1012 multiplies each score in the 2D-Array 750 by a relevancy factor according to the relevancy listed in the 2D-Array 750. In the current embodiment, the relevancy factors are A=1.5, B=1, C=0.75, D=0.5, and E=0.5.
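Steps 702 through 706 can be sketched as building one row per document classification and then applying the two optional weightings. This is a hypothetical illustration. The input is assumed to be a list of (relevancy, [codes]) pairs that has already been filtered per step 701. The `class_size` callable stands in for the size lookup against the classification data provider 1040. The names `Row` and `build_rows` are invented for this example. Only the relevancy factors come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Relevancy factors of the current embodiment (step 706).
RELEVANCY_FACTORS = {"A": 1.5, "B": 1.0, "C": 0.75, "D": 0.5, "E": 0.5}


@dataclass
class Row:
    code: str        # document classification 135, e.g. "705/14"
    relevancy: str   # "A".."E", as tagged in relevancy column 257
    score: float     # starts at 1
    primary: bool    # True if this is the first classification listed


def build_rows(
    docs: Sequence[Tuple[str, List[str]]],
    class_weighting: bool = False,
    class_size: Optional[Callable[[str], int]] = None,
    relevancy_weighting: bool = False,
) -> List[Row]:
    """Steps 702-706: compile the 2D-Array and apply the optional weightings."""
    rows = [
        Row(code, relevancy, 1.0, position == 0)
        for relevancy, codes in docs
        for position, code in enumerate(codes)
    ]
    if class_weighting:                           # steps 703-704
        if class_size is None:
            raise ValueError("class weighting needs a class_size lookup")
        for row in rows:
            row.score /= class_size(row.code)     # inverse-size weighting
    if relevancy_weighting:                       # steps 705-706
        for row in rows:
            row.score *= RELEVANCY_FACTORS[row.relevancy]
    return rows
```

Keeping both weightings as independent flags mirrors the separate class weighting 286 and relevancy weighting 287 switches.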

Main Classes Mode: If the classification analysis mode selection field 282 is set to “Main”, the process proceeds through step 717 to step 718. At step 718, the document classifications 135 in the 2D-Array 750 are rewritten to show only the classes 136. Next, at step 719, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The 2D-Array 750 is then sorted from high to low according to score, the class descriptions are added, and at step 720 the result is displayed in interface 280. See FIG. 2I for an example of the interface 280 after a run in Main Classes mode.
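A minimal sketch of the Main Classes collapse follows. It assumes that codes are "class/subclass" strings such as "705/14", that rows arrive as (code, score) pairs after steps 701-706, and that the class-description lookup is omitted. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def main_classes(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Main Classes mode: collapse codes to their class, sum scores, rank."""
    totals: Dict[str, float] = defaultdict(float)
    for code, score in rows:
        main_class = code.split("/", 1)[0]   # keep only the class 136
        totals[main_class] += score          # merge repeat entries
    # Sort high to low; Python's sort is stable, so ties keep first-seen order.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For example, `main_classes([("705/14", 1.0), ("707/3", 1.0), ("705/1", 1.0)])` yields `[("705", 2.0), ("707", 1.0)]`.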

Subclass Parents Mode: If the classification analysis mode selection field 282 is set to “Subclass Parents”, the process proceeds through step 714 and on to step 715. Next, the classification analysis module 1012 requests all ancestors of the document classifications 135 in the 2D-Array 750 from the classification data provider 1040 via the interface module 1014. The ancestors are then inserted into the 2D-Array 750, and the original document classifications are simultaneously deleted from the 2D-Array 750. Next, at step 716, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The resulting table is displayed in the classification analysis interface 280. See FIG. 2J for an example of the interface 280 after a run in Subclass Parents mode.
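The ancestor substitution can be sketched as follows. The sketch makes several assumptions. The hierarchy is held in a hypothetical acyclic child-to-parent dictionary standing in for the classification data provider 1040. Each ancestor inherits the full score of the row it replaces, since the text does not say how scores are distributed. The final sort follows the grouping-and-sorting language of claims 12 and 23.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the hierarchy held by the classification data
# provider 1040: child code -> parent code (top-level codes are absent).
ParentMap = Dict[str, str]


def ancestors(code: str, parent_of: ParentMap) -> List[str]:
    """Walk up the (acyclic) hierarchy and return every ancestor, nearest first."""
    chain = []
    parent = parent_of.get(code)
    while parent is not None:
        chain.append(parent)
        parent = parent_of.get(parent)
    return chain


def subclass_parents(
    rows: Iterable[Tuple[str, float]], parent_of: ParentMap
) -> List[Tuple[str, float]]:
    """Replace each code by its ancestors, then sum repeats and rank."""
    totals: Dict[str, float] = defaultdict(float)
    for code, score in rows:
        # The original code is dropped; each ancestor inherits its score.
        for ancestor in ancestors(code, parent_of):
            totals[ancestor] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Because the original classifications are deleted, a top-level code with no ancestors contributes nothing to the result.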

Subclass Mode: If the classification analysis mode selection field 282 is set to “Subclass”, the process proceeds through step 710 and on to step 711. Next, the classification analysis module 1012 rearranges the previously generated 2D-Array 750 by summing the scores of repeat classification entries and eliminating all repeats. The resulting 2D-Array 750 is sorted according to score from high to low. Next, at step 712, the classification analysis module 1012 compares all rows in the 2D-Array 750 to all rows of the classification search history 290 and assigns colors according to the following scheme (see also FIG. 3D for the scheme): 1) if a classification is in the 2D-Array 750 and is not in the classification search history 290, assign green; 2) if a classification is in the 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “No”, assign light yellow; 3) if a classification is in the 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “Yes”, assign bright yellow; and 4) if a classification is in the 2D-Array 750 and is in the classification search history 290 with a search status 293 of “Yes”, assign red. At step 720, the resulting table is displayed along with the color scheme in the classification analysis interface 280. See FIG. 2G for an example of the interface 280 after a run in Subclass mode.
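The four-level scheme of step 712 can be sketched as below. The search history is modeled as a hypothetical dictionary from code to a (search status 293, search extent 294) pair of "Yes"/"No" strings. Rows are (code, score) pairs. The text does not define any status/extent combination beyond the four listed, so any other extent value falls back to light yellow here as an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the classification search history 290:
# code -> (search status 293, search extent 294), each "Yes" or "No".
History = Dict[str, Tuple[str, str]]


def color_for(code: str, history: History) -> str:
    """Four-level scheme of step 712 (compare FIG. 3D)."""
    if code not in history:
        return "green"
    status, extent = history[code]
    if status == "Yes":
        return "red"
    if extent == "Yes":
        return "bright yellow"
    return "light yellow"   # status "No", extent "No" (or any other value)


def subclass_mode(
    rows: Iterable[Tuple[str, float]], history: History
) -> List[Tuple[str, float, str]]:
    """Steps 711-712: sum repeat codes, rank high to low, then color."""
    totals: Dict[str, float] = defaultdict(float)
    for code, score in rows:
        totals[code] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(code, score, color_for(code, history)) for code, score in ranked]
```

A search status of "Yes" is checked first, so a completed code is red regardless of its search extent, matching rule 4.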

Primary Mode: If the classification analysis mode selection field 282 is set to “Primary”, the process proceeds through step 707 and on to step 708. Next, the classification analysis module 1012 sorts through the 2D-Array 750 and removes all but the entries labeled as primary. At step 720, the resulting table is displayed in the classification analysis interface 280.
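The Primary mode filter reduces to keeping rows whose primary flag is set. The `Row` type below is a hypothetical stand-in for one entry of the 2D-Array 750. The description does not say whether repeat primary entries are summed or re-sorted before display, so this sketch performs only the filtering it describes.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    code: str        # document classification 135
    score: float     # score after any weighting in steps 703-706
    primary: bool    # True if first listed for its document


def primary_mode(rows: List[Row]) -> List[Row]:
    """Step 708: keep only the entries labeled as primary."""
    return [row for row in rows if row.primary]
```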

Referring to FIG. 4, a block diagram is shown illustrating a document analysis system 800 in accordance with another exemplary embodiment of the invention. The document analysis system 800 is similar to the document analysis system of FIG. 1A but provides a client-server architecture. Accordingly, document analysis system 800 includes a client device 810 and a server device 880. The server device 880 may be a computing device having a processor, such as a personal computer, or may be implemented on a high performance server, such as an HP, IBM, or Sun computer, using an operating system such as, but not limited to, Windows, Solaris, or UNIX. The server device 880 includes a classification analysis module similar in function to the classification analysis module 1012 of the embodiment of FIG. 1A.

Thus, a document analysis system is contemplated that allows for efficient and accurate identification of potentially relevant classifications. Referring now to FIG. 5, an exemplary method 2100 of performing a patent search using multiple modes of the present invention comprises the following steps:

Step 2101: Synthesizing a proposition into one or more key concepts 272;

Step 2102: Developing one or more keyword groups 214 based on the key concepts 272;

Step 2103: Conducting a text search inquiry, using the keyword groups 214, over a database of documents having text, images, and one or more document classifications 135 therein;

Step 2104: Compiling a search file of documents 1032 from the text search inquiry;

Step 2105: Selecting a first set of documents from the file of documents 1032 and creating a project file 205;

Step 2106: Tagging documents 1032 in the project file 205 using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;

Step 2107: Instructing a classification analysis module 1012 to run in Main Classes mode to locate a set of classes 136 by counting and ranking them according to frequency;

Step 2108: Conducting a first class & text search over the database using the top-ranked classes 136 combined with text from the keyword groups 214;

Step 2109: Compiling a second search file of documents 1032 from the classification & text search;

Step 2110: Selecting a second set of 4-5 documents 1032 and appending the set to the project file 205;

Step 2111: Tagging untagged documents in the project file 205 as appropriate, and particularly the second set of documents, using the document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;

Step 2112: Instructing the classification analysis module 1012 to run in Subclass Parents mode to locate a second set of document classifications 135 by counting and ranking them according to frequency;

Step 2113: Inspecting a classification schedule to locate potentially relevant child classifications of the second set located in step 2112 and adding said classifications to the classification search history 290;

Step 2114: Conducting a third classification & text search over the database using the classifications from step 2113 combined with text from the keyword groups 214;

Step 2115: Compiling a third search file of documents 1032 from the third classification & text search;

Step 2116: Selecting a third set of 4-5 documents 1032 and appending the set to the project file 205;

Step 2117: Tagging untagged documents in the project file 205 as appropriate, and particularly the third set of documents, using the document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;

Step 2118: Instructing the classification analysis module 1012 to run in Subclass mode by counting and ranking document classifications 135 according to frequency and cross-referencing the results against the classification search history 290 to locate an nth document classification 135 to add to the classification search history 290;

Step 2119: Conducting an nth search over the database using the nth classification from step 2118, either combined with text from the keyword groups 214 or by inspecting the nth classification in its entirety;

Step 2120: Compiling an nth search file of documents 1032 from the nth classification & text search;

Step 2121: Selecting all relevant documents 1032 and appending the set to the project file 205;

Step 2122: Tagging untagged documents in the project file 205 as appropriate, and particularly the nth set of documents, using the document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;

Step 2123: Inspecting the classification search history 290 for a minimum of ten document classification codes and optionally repeating steps 2118 to 2123;

Step 2124: Conducting a forward and backward citation search (not shown) on the selected high-relevance documents from the project file 205 and adding relevant documents to the project file 205;

Step 2125: End.

While the foregoing invention has been described with reference to the above-described embodiments, various modifications and changes can be made without departing from the spirit of the invention. Accordingly, all such modifications and changes are considered to be within the scope of the appended claims.

1. A search system for searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, wherein a search is conducted based on a predetermined subject matter, the system comprising: a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications that are relevant to the subject matter of the search, the program module comprising a classification analysis module; wherein the classification analysis module: receives a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value; determines a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document; determines a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched; and generates and displays a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values.
2. A system according to claim 1 wherein the table is sorted based on the score of each of the unique classification values.
3. A system according to claim 1 wherein each of the unique classification values is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.
4. A system according to claim 1 wherein each of the unique classification values relating to a document located in the search that is determined to be a relevant document is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value, wherein the predetermined value is derived from the overall relevance of the document located in the search, and wherein the weighted classification value is used to modify the score.
5. A system according to claim 1 wherein each of the unique classification values relating to a document located in the search is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.
6. A system according to claim 1 wherein the classification analysis module separates the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.
7. A system according to claim 1 wherein each of the unique classification values are organized in a hierarchy providing each of the unique classification values with at least one ancestor node; and wherein each of the unique classification values is replaced with the at least one ancestor node.
8. A system according to claim 1 wherein each of the unique classification values includes a class value and a subclass value; and wherein each of the unique classification values is replaced with the class value.
9. A system according to claim 1 wherein the classification analysis module determines the search indicator for each unique classification value by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value.
10. A system according to claim 1 wherein the classification analysis module assigns a color to be displayed on a user interface relating to the search indicator.
11. A system according to claim 1 wherein the computer is a server and the system further comprises a client computer, the server communicatively coupled to the client computer; and wherein the program module is located on the client computer and the classification analysis module is located on the server.
12. A system according to claim 7 wherein each of the unique classification values are grouped by adding the scores of each of the unique classification values after being replaced; wherein the grouped unique classification values are sorted according to the scores; and wherein the sorted grouped unique classification values are displayed on the table.
13. A method of searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, the plurality of classified documents being searched based on a predetermined subject matter, the method comprising: determining record classifications that are relevant to the subject matter of the search using a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications; receiving a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value; determining a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document; determining a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched; and generating and displaying a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values.
14. A method according to claim 13 further comprising sorting the table based on the score of each of the unique classification values.
15. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values that corresponds to a weight of each of the unique classification values to define a weighted classification value; and wherein the weighted classification value is used to modify the score.
16. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that is determined to be a relevant document that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.
17. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.
18. A method according to claim 13 further comprising separating the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.
19. A method according to claim 13 further comprising organizing each of the unique classification values in a hierarchy providing each of the unique classification values with at least one ancestor node; and further comprising replacing each of the unique classification values with the at least one ancestor node.
20. A method according to claim 13 wherein each of the unique classification values includes a class value and a subclass value; and further comprising replacing each of the unique classification values with the class value.
21. A method according to claim 13 further comprising determining the search indicator for each unique classification value by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value.
22. A method according to claim 13 further comprising assigning a color to be displayed on a user interface relating to the search indicator.
23. A method according to claim 19 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
24. A method of searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, the plurality of classified documents being searched based on a predetermined subject matter, the method comprising: determining record classifications that are relevant to the subject matter of the search using a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications; receiving a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value; determining a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document; determining a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched, wherein the search indicator is determined by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value; generating and displaying a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values; and assigning a color to be displayed on a user interface relating to the search indicator.
25. A method according to claim 24 further comprising sorting the table based on the score of each of the unique classification values.
26. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values that corresponds to a weight of each of the unique classification values to define a weighted classification value; and wherein the weighted classification value is used to modify the score.
27. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that is determined to be a relevant document that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.
28. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.
29. A method according to claim 24 further comprising separating the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.
30. A method according to claim 24 further comprising organizing each of the unique classification values in a hierarchy providing each of the unique classification values with at least one ancestor node; and further comprising replacing each of the unique classification values with the at least one ancestor node.
31. A method according to claim 24 wherein each of the unique classification values includes a class value and a subclass value; and further comprising replacing each of the unique classification values with the class value.
32. A method according to claim 30 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
33. A system according to claim 8 wherein each of the unique classification values are grouped by adding the scores of each of the unique classification values after being replaced; wherein the grouped unique classification values are sorted according to the scores; and wherein the sorted grouped unique classification values are displayed on the table.
34. A method according to claim 20 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
35. A method according to claim 31 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.